Compatible Reward Inverse Reinforcement Learning

Authors

  • Alberto Maria Metelli
  • Matteo Pirotta
  • Marcello Restelli
Abstract

PROBLEM

  • Inverse Reinforcement Learning (IRL): recover a reward function that explains a set of expert demonstrations.
  • Advantage of IRL over Behavioral Cloning (BC):
    – Transferability of the recovered reward.
  • Open issues in many IRL methods:
    – How to build the features for the reward function?
    – How to select a reward function among all the optimal ones?
    – What if there is no access to the environment?

CONTRIBUTIONS

  1. We propose Compatible Reward Inverse Reinforcement Learning (CR-IRL):
    • CR-IRL is model-free, since it requires only a set of expert demonstrations;
    • CR-IRL performs both feature extraction and reward selection.
  2. We provide empirical results showing that the rewards recovered by CR-IRL allow learning the optimal policy faster than the original reward function.
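To make the IRL setting concrete: a common parameterization (not the paper's specific CR-IRL construction) represents the reward as a linear combination of features, r(s, a) = wᵀφ(s, a), so that recovering a reward amounts to choosing a feature map and a weight vector. The feature map `phi` and the weights `w` below are purely illustrative placeholders, a minimal sketch of this parameterization.

```python
import numpy as np

def phi(state, action):
    """Hypothetical feature map for a toy 1-D task (illustrative only)."""
    return np.array([state, action, state * action], dtype=float)

def reward(state, action, w):
    """Linear reward r(s, a) = w^T phi(s, a)."""
    return float(w @ phi(state, action))

# Candidate reward weights; in IRL these would be inferred from
# expert demonstrations rather than chosen by hand.
w = np.array([1.0, -0.5, 0.25])
print(reward(2, 1, w))  # 1.0*2 - 0.5*1 + 0.25*2 = 2.0
```

The two open issues the abstract lists map directly onto this sketch: building the features is the choice of `phi`, and selecting among optimal rewards is the choice of `w` when many weight vectors make the expert's behavior optimal.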


Related articles

Nonlinear Inverse Reinforcement Learning with Gaussian Processes

We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonli...


Inverse Optimal Control

In Reinforcement Learning, an agent learns a policy that maximizes a given reward function. However, providing a reward function for a given learning task is often non-trivial. Inverse Reinforcement Learning, sometimes also called Inverse Optimal Control, addresses this problem by learning the reward function from expert demonstrations. The aim of this paper is to give a brief introduc...


The Use of Apprenticeship Learning Via Inverse Reinforcement Learning for Generating Melodies

The research presented in this paper uses apprenticeship learning via inverse reinforcement learning to ascertain a reward function in a musical context. The learning agent then used this reward function to generate new melodies using reinforcement learning. Reinforcement learning is a type of machine learning in which rewards are used to guide an agent's learning. These rewards are u...


OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectori...


Inverse Reinforcement Learning based on Critical State

Inverse reinforcement learning searches for a reward function in a Markov Decision Process. In IRL settings, experts produce good traces from which agents learn and adjust the reward function. But this function is difficult to specify in some complicated problems. In this paper, Inverse Reinforcement Learning based on Critical State (IRLCS) is proposed to search for a succinct and meaningfu...



Publication date: 2017